AITopics | constant regret

No-regret learning in games with noisy feedback: Faster rates and adaptivity via learning rate separation

Neural Information Processing SystemsApr-25-2026, 06:01:20 GMT

We examine the problem of regret minimization when the learner is involved in a continuous game with other optimizing agents: in this case, if all players follow a no-regret algorithm, it is possible to achieve significantly lower regret relative to fully adversarial environments. We study this problem in the context of variationally stable games (a class of continuous games which includes all convexconcave and monotone games), and when the players only have access to noisy estimates of their individual payoff gradients. If the noise is additive, the gametheoretic and purely adversarial settings enjoy similar regret guarantees; however, if the noise is multiplicative, we show that the learners can, in fact, achieve constant regret. We achieve this faster rate via an optimistic gradient scheme with learning rate separation - that is, the method's extrapolation and update steps are tuned to different schedules, depending on the noise profile. Subsequently, to eliminate the need for delicate hyperparameter tuning, we propose a fully adaptive method that attains nearly the same guarantees as its non-adapted counterpart, while operating without knowledge of either the game or of the noise profile.

artificial intelligence, machine learning, noise, (17 more...)

Neural Information Processing Systems

Country:

Europe (0.28)
North America > United States (0.28)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

13f17f74ec061f1e3e231aca9a43ff23-Supplemental-Conference.pdf

Neural Information Processing SystemsApr-24-2026, 17:30:30 GMT

artificial intelligence, constraint, machine learning, (18 more...)

Neural Information Processing Systems

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
(2 more...)

Add feedback

0463ec87d0ac1e98a6cbe3d95d4e3e35-Supplemental-Conference.pdf

Neural Information Processing SystemsApr-24-2026, 08:55:31 GMT

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Neural Information Processing Systems

Genre: Research Report (0.68)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)

Add feedback

0463ec87d0ac1e98a6cbe3d95d4e3e35-Paper-Conference.pdf

Neural Information Processing SystemsApr-24-2026, 08:55:27 GMT

artificial intelligence, machine learning, reinforcement learning, (13 more...)

Neural Information Processing Systems

Genre: Research Report (0.47)

Industry: Health & Medicine (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)
Information Technology > Data Science (0.68)

Add feedback

Constant Regret, Generalized Mixability, and Mirror Descent

Neural Information Processing SystemsMar-16-2026, 23:32:27 GMT

We consider the setting of prediction with expert advice; a learner makes predictions by aggregating those of a group of experts. Under this setting, and for the right choice of loss function and ``mixing'' algorithm, it is possible for the learner to achieve a constant regret regardless of the number of prediction rounds. For example, a constant regret can be achieved for \emph{mixable} losses using the \emph{aggregating algorithm}. The \emph{Generalized Aggregating Algorithm} (GAA) is a name for a family of algorithms parameterized by convex functions on simplices (entropies), which reduce to the aggregating algorithm when using the \emph{Shannon entropy} $\operatorname{S}$. For a given entropy $\Phi$, losses for which a constant regret is possible using the \textsc{GAA} are called $\Phi$-mixable. Which losses are $\Phi$-mixable was previously left as an open question. We fully characterize $\Phi$-mixability and answer other open questions posed by \cite{Reid2015}. We show that the Shannon entropy $\operatorname{S}$ is fundamental in nature when it comes to mixability; any $\Phi$-mixable loss is necessarily $\operatorname{S}$-mixable, and the lowest worst-case regret of the \textsc{GAA} is achieved using the Shannon entropy. Finally, by leveraging the connection between the \emph{mirror descent algorithm} and the update step of the GAA, we suggest a new \emph{adaptive} generalized aggregating algorithm and analyze its performance in terms of the regret bound.

artificial intelligence, machine learning, proceedings, (13 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.39)

Add feedback

Achieving Constant Regret in Linear Markov Decision Processes

Neural Information Processing SystemsFeb-18-2026, 14:20:40 GMT

We study the constant regret guarantees in reinforcement learning (RL). Our objective is to design an algorithm that incurs only finite regret over infinite episodes with high probability. We introduce an algorithm, Cert-LSVI-UCB, for misspec-ified linear Markov decision processes (MDPs) where both the transition kernel and the reward function can be approximated by some linear function up to mis-specification level ζ . At the core of Cert-LSVI-UCB is an innovative certified estimator, which facilitates a fine-grained concentration analysis for multi-phase value-targeted regression, enabling us to establish an instance-dependent regret bound that is constant w.r.t. the number of episodes.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Country: